Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler-v2] Improved flush writes optimization #15718

Merged
merged 1 commit into from
Jan 29, 2025

Conversation

vineethk
Copy link
Contributor

@vineethk vineethk commented Jan 13, 2025

Description

Please note that we should not merge this PR until it undergoes comparison testing.

This PR contains a couple of optimizations, that together bring about 0.5% reduction in code size when compiling the aptos-framework and its dependencies (and compiler v2 produces 4% less code compared to compiler v1). They also address some transactions where v2 generated code took more gas than v1 generated code.

  1. We now do peephole optimizations even when inline specs are present. This is accomplished by remapping the code offsets to spec mapping based on the remapping returned by the peephole optimizer.
  2. We consider the usages of temps within an instruction and usages of temps between different instructions. If these usages are such that their definitions are out of order, then we hint at flushing those temps. Note that the flush writes optimization only suggests which writes should be flushed to locals right away, so in the worst case, it would produce additional stores and copy/moves.

As an example, for the Move code below:

fun test() {
    let a = one();
    consume(one(), a, one(), one(), one(), one());
}

Compiler v2 currently produces:

0: Call one(): u64
1: StLoc[0](loc0: u64)
2: Call one(): u64
3: Call one(): u64
4: Call one(): u64
5: Call one(): u64
6: Call one(): u64
7: StLoc[1](loc1: u64)
8: StLoc[2](loc2: u64)
9: StLoc[3](loc3: u64)
10: StLoc[4](loc4: u64)
11: MoveLoc[0](loc0: u64)
12: MoveLoc[4](loc4: u64)
13: MoveLoc[3](loc3: u64)
14: MoveLoc[2](loc2: u64)
15: MoveLoc[1](loc1: u64)
16: Call consume(u64, u64, u64, u64, u64, u64)
17: Ret

With this PR, compiler v2 produces:

0: Call one(): u64
1: StLoc[0](loc0: u64)
2: Call one(): u64
3: MoveLoc[0](loc0: u64)
4: Call one(): u64
5: Call one(): u64
6: Call one(): u64
7: Call one(): u64
8: Call consume(u64, u64, u64, u64, u64, u64)
9: Ret

How Has This Been Tested?

  • Several new tests.
  • Existing tests baselines have been checked and verified. We either produce less code or the same amount (except in the case of third_party/move/move-compiler-v2/tests/variable-coalescing/swap.opt.exp where we produce 2 additional instructions).

Key Areas to Review

  • The specification remapping due to peephole optimization
  • Flush writes optimization and the correctness argument above in point 2.

Type of Change

  • Performance improvement

Which Components or Systems Does This Change Impact?

  • Move Compiler

Copy link

trunk-io bot commented Jan 13, 2025

⏱️ 48m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-targeted-unit-tests 21m 🟩
rust-cargo-deny 11m 🟩🟩🟩🟩🟩 (+1 more)
check-dynamic-deps 9m 🟩🟩🟩🟩🟩 (+1 more)
general-lints 3m 🟩🟩🟩🟩🟩 (+1 more)
semgrep/ci 2m 🟩🟩🟩🟩🟩 (+1 more)
file_change_determinator 1m 🟩🟩🟩🟩🟩 (+1 more)
permission-check 16s 🟩🟩🟩🟩🟩 (+1 more)
permission-check 14s 🟩🟩🟩🟩🟩 (+1 more)
check-branch-prefix 1s 🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
check-dynamic-deps 3m 2m +80%

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link
Contributor Author

vineethk commented Jan 13, 2025

@vineethk vineethk changed the title [compiler-v2] Experimental flush writes, second take [compiler-v2] Improved flush writes optimization Jan 13, 2025
@vineethk vineethk force-pushed the vk/experimental-flush-writes-2 branch 4 times, most recently from 1603bb7 to dedeb67 Compare January 13, 2025 21:14
@vineethk vineethk marked this pull request as ready for review January 13, 2025 21:33
@vineethk vineethk force-pushed the vk/experimental-flush-writes-2 branch from dedeb67 to f64e874 Compare January 14, 2025 16:36
@vineethk vineethk requested a review from wrwg January 14, 2025 16:38
@vineethk
Copy link
Contributor Author

@wrwg I believe I have addressed your comments, PTAL.

Copy link
Contributor

@wrwg wrwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, as far as I understand.

I have a general question: what could go wrong if the flush processor has a bug? Is my understanding correct that the code would be weird (unnecessary flushing, for example), but it would not have a semantical effect?

23: MoveLoc[2](loc1: &mut u64)
24: WriteRef
25: Ret
11: LdU64(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example where optimization makes the code size bigger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this and swap.opt.exp are the two minor regressions.

@vineethk
Copy link
Contributor Author

LGTM, as far as I understand.

I have a general question: what could go wrong if the flush processor has a bug? Is my understanding correct that the code would be weird (unnecessary flushing, for example), but it would not have a semantical effect?

@wrwg Yes, like I mentioned in the PR description, in the worst case (say, due to a bug), we mark additional/fewer temps to be flushed, then it can result in more StLoc-copy/moves than necessary, but the overall semantics itself should not change. The flush marks added by this pass can only cause a temp added to a stack to be flushed to a local right away, instead of retaining it on the stack.

@vineethk vineethk force-pushed the vk/experimental-flush-writes-2 branch from f64e874 to 14244c9 Compare January 22, 2025 01:56
@vineethk vineethk force-pushed the vk/experimental-flush-writes-2 branch from 14244c9 to 0c69be6 Compare January 28, 2025 15:54
@vineethk vineethk enabled auto-merge (squash) January 28, 2025 16:12

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@vineethk vineethk force-pushed the vk/experimental-flush-writes-2 branch from 0c69be6 to 45f6387 Compare January 29, 2025 16:52

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea

Compatibility test results for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea (PR)
1. Check liveness of validators at old version: 55909aa300b714e37df1113a239332c334cb2896
compatibility::simple-validator-upgrade::liveness-check : committed: 12489.70 txn/s, latency: 2491.18 ms, (p50: 2500 ms, p70: 2600, p90: 2800 ms, p99: 3100 ms), latency samples: 411480
2. Upgrading first Validator to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 4539.24 txn/s, latency: 6875.18 ms, (p50: 7500 ms, p70: 7900, p90: 8600 ms, p99: 8800 ms), latency samples: 93500
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4366.25 txn/s, latency: 7770.82 ms, (p50: 8700 ms, p70: 8700, p90: 8900 ms, p99: 9100 ms), latency samples: 153080
3. Upgrading rest of first batch to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 4140.31 txn/s, latency: 7548.45 ms, (p50: 8300 ms, p70: 8900, p90: 9400 ms, p99: 9500 ms), latency samples: 90500
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4153.88 txn/s, latency: 8190.35 ms, (p50: 9200 ms, p70: 9400, p90: 9600 ms, p99: 9700 ms), latency samples: 146020
4. upgrading second batch to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 7907.16 txn/s, latency: 3892.16 ms, (p50: 4400 ms, p70: 4900, p90: 5200 ms, p99: 5200 ms), latency samples: 145380
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 7902.21 txn/s, latency: 4327.74 ms, (p50: 4700 ms, p70: 4800, p90: 5000 ms, p99: 5200 ms), latency samples: 264100
5. check swarm health
Compatibility test for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 45f6387456d8a64717faf5c83f5d79b9b32010ea

two traffics test: inner traffic : committed: 14776.01 txn/s, submitted: 14776.06 txn/s, expired: 0.05 txn/s, latency: 2681.30 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3600 ms), latency samples: 5618280
two traffics test : committed: 99.97 txn/s, latency: 1414.99 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 2700 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.501, avg: 1.413", "ConsensusProposalToOrdered: max: 0.303, avg: 0.292", "ConsensusOrderedToCommit: max: 0.412, avg: 0.397", "ConsensusProposalToCommit: max: 0.708, avg: 0.688"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.75s no progress at version 2974571 (avg 0.19s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.62s no progress at version 2378301 (avg 0.62s) [limit 16].
Test Ok

Copy link
Contributor

✅ Forge suite framework_upgrade success on 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea

Compatibility test results for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea (PR)
Upgrade the nodes to version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1616.90 txn/s, submitted: 1622.26 txn/s, failed submission: 5.36 txn/s, expired: 5.36 txn/s, latency: 1794.48 ms, (p50: 1600 ms, p70: 2000, p90: 2700 ms, p99: 3600 ms), latency samples: 144921
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1572.03 txn/s, submitted: 1578.65 txn/s, failed submission: 6.62 txn/s, expired: 6.62 txn/s, latency: 1779.39 ms, (p50: 1800 ms, p70: 2000, p90: 2600 ms, p99: 3900 ms), latency samples: 142460
5. check swarm health
Compatibility test for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea passed
Upgrade the remaining nodes to version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1850.35 txn/s, submitted: 1857.11 txn/s, failed submission: 6.41 txn/s, expired: 6.76 txn/s, latency: 1574.59 ms, (p50: 1400 ms, p70: 1800, p90: 2500 ms, p99: 3400 ms), latency samples: 167548
Test Ok

@vineethk vineethk merged commit dfc3444 into main Jan 29, 2025
46 checks passed
@vineethk vineethk deleted the vk/experimental-flush-writes-2 branch January 29, 2025 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants